Attribute Reliability in Cognitive Diagnostic Assessment
Abstract
The attribute hierarchy method (AHM) is a psychometric procedure for classifying examinees’ test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Results from an AHM analysis yield information on examinees’ cognitive strengths and weaknesses. Hence, the AHM can be used for cognitive diagnostic assessment. The purpose of this study is to introduce and evaluate a new concept for assessing attribute reliability using the ratio of true score variance to observed score variance on items that probe specific cognitive attributes. This reliability procedure is evaluated and illustrated using both simulated data and student response data from a sample of algebra items taken from the March 2005 administration of the SAT. The reliability of diagnostic scores and the implications for practice are also discussed.

Acknowledgements

The research reported in this study was conducted, in part, with funds provided to the second author by the College Entrance Examination Board. We would like to thank the College Entrance Examination Board for their support. However, the authors are solely responsible for the ideas, methods, procedures, and interpretations expressed in this study. Our views do not necessarily reflect those of the College Entrance Examination Board.

Cognitive diagnostic assessment (CDA) is a form of testing that employs a cognitive model, first, to develop or identify items that measure specific knowledge and skills and, then, to direct the psychometric analyses of the examinees’ item response patterns to promote specific test score inferences. CDAs are designed to measure these specific knowledge structures and processing skills in order to provide examinees with information about their cognitive strengths and weaknesses.
A cognitive model in educational measurement refers to a “simplified description of human problem solving on standardized educational tasks, which helps to characterize the knowledge and skills students at different levels of learning have acquired and to facilitate the explanation and prediction of students’ performance” (Leighton & Gierl, 2007, p. 6). Cognitive models are generated by studying the knowledge, processes, and strategies used by examinees as they respond to items. One benefit of developing or identifying items and analyzing data according to a cognitive model stems from the detailed information that can be obtained about the knowledge and skills that examinees actually use to solve test items. In fact, some cognitive psychologists are now urging educational measurement specialists to develop assessment procedures using cognitive models. Pellegrino, Baxter, and Glaser (1999) claimed that:

...it is the pattern of performance over a set of items or tasks explicitly constructed to discriminate between alternative profiles of knowledge that should be the focus of assessment. The latter can be used to determine the level of a given student’s understanding and competence within a subject-matter domain. Such information is interpretative and diagnostic, highly informative, and potentially prescriptive. (p. 335)

In short, CDAs have the potential for identifying examinees’ problem-solving strengths and weaknesses, particularly when the assessments are created from cognitive models that provide a contemporary representation of the knowledge structures and processing skills that are believed to underlie conceptual understanding in a particular domain. CDA results could also be integrated into the teaching and learning process because this form of assessment supports specific inferences about the examinees’ problem-solving skills that could be linked with specific instructional methods designed to improve these cognitive skills.
In an attempt to uncover the diagnostic information that may be embedded in examinees’ item response data and to address the challenge posed by Pellegrino et al. (1999), psychometric procedures have been developed to support test score inference based on cognitive models of test performance. These cognitive diagnostic models contain parameters that link item features to the examinees’ response patterns so inferences about declarative, procedural, and strategic knowledge can be made. Some early examples include the rule space model (Tatsuoka, 1983) and the linear logistic test model (Fischer, 1973). More recent examples include the DINA models (de la Torre & Douglas, 2004), the NIDA models (Junker & Sijtsma, 2001), the DINO models (Templin & Henson, 2006), the Fusion model (e.g., Roussos et al., 2007), and the hierarchical general diagnostic model (von Davier, 2007).

In 2004, Leighton, Gierl, and Hunka also introduced a procedure for CDA called the attribute hierarchy method (AHM). The AHM, a method that evolved from Tatsuoka’s rule space model (see Gierl, 2007), is a psychometric procedure for classifying examinees’ item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Attributes include different procedures, skills, and/or processes that an examinee must possess to solve an item. These attributes are structured using a hierarchy so the ordering of the cognitive skills is specified. As a result, the attribute hierarchy serves as an explicit construct-centered cognitive model. This model, in turn, provides a framework for designing test items and for linking examinees’ test performance to specific inferences about psychological skill acquisition.
AHM developments have been documented in the educational and psychological measurement literature, including psychometric advances (e.g., Leighton et al., 2004; Gierl, Leighton, & Hunka, 2007; Gierl, Cui, & Hunka, in press; Cui, Leighton, Gierl, & Hunka, 2006) and practical applications (e.g., Gierl, Wang, & Zhou, 2008; Wang & Gierl, 2007). The AHM has also been used to study differential item functioning (Gierl, Zheng, & Cui, 2008) and to service diagnostic adaptive testing (Gierl & Zhou, 2008). To date, however, the AHM has not been applied in an operational diagnostic testing situation because the reliability for attribute-based scoring must first be established. Attribute reliability is a fundamental concept in CDA because score reports must provide users with a comprehensive yet succinct summary of the outcomes from testing, including score precision. The authors of the Standards for Educational and Psychological Testing (1999) make this point clear when they state in Standard 5.10:

When test score information is released to students, parents, legal representatives, teachers, clients, or the media, those responsible for testing programs should provide appropriate interpretations. The interpretations should describe in simple language what the test covers, what scores mean, the precision of the scores, common misinterpretations of test scores, and how scores will be used.

Addressing what the test covers, what scores mean, common misinterpretations, and how scores are used is, largely, a descriptive process that is specific to a particular testing situation. Conversely, addressing score precision is a more analytic process that generalizes across testing situations. Hence, the purpose of this paper is to introduce and evaluate an analytic procedure for assessing attribute reliability.
Overview of Attribute Hierarchy Method

The AHM (Leighton et al., 2004) is a psychometric method for classifying examinees’ item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. An attribute is a description of the procedural or declarative knowledge needed to perform a task in a specific domain. These attributes form a hierarchy that defines the psychological ordering among the attributes required to solve a test item. The examinee must possess these attributes to answer items correctly. The attribute hierarchy serves as a cognitive model of task performance which, in educational measurement, refers to a simplified description of human problem solving on standardized tasks at some convenient grain size or level of detail in order to facilitate explanation and prediction of students’ performance, including their strengths and weaknesses (Leighton & Gierl, 2007). These models provide an interpretative framework that can guide item development so test performance can be linked to specific inferences about examinees’ cognitive skills. An AHM analysis is often conducted as a two-stage process, where the cognitive model is developed first, and then the examinee response data are classified using statistical pattern recognition techniques to produce attribute probability estimates.

Stage 1: Cognitive Model Development

The purpose of the first stage is to generate the expected examinee response patterns for a specific attribute hierarchy. A sample hierarchy is presented in Figure 1. This example is used in Leighton et al. (2004) and it will also be used as one attribute hierarchy in our simulation study. The hierarchy contains two divergent branches that share attribute 1 as a common prerequisite. In the first branch, attribute 2 is prerequisite to attribute 3. In the second branch, attribute 4 is prerequisite to attributes 5 and 6.
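The sample hierarchy described above can be written down as a small data structure. The following Python sketch is only an illustration, not part of the AHM software: the names `PREREQUISITES`, `is_consistent`, and `consistent_patterns` are hypothetical, and the code simply enumerates the attribute combinations that respect the stated prerequisite relations (attribute 1 before 2 and 4, attribute 2 before 3, attribute 4 before 5 and 6).

```python
from itertools import product

# Direct prerequisites of the sample hierarchy as described in the text.
# Hypothetical encoding: each attribute maps to the attribute it directly
# requires (None for attribute 1, which has no prerequisite).
PREREQUISITES = {1: None, 2: 1, 3: 2, 4: 1, 5: 4, 6: 4}


def is_consistent(pattern):
    """Return True if every mastered attribute has its direct prerequisite mastered.

    `pattern` is a tuple of six 0/1 values, one per attribute (attribute 1 first).
    Checking direct prerequisites is enough, because indirect ones follow by chaining.
    """
    for attribute, parent in PREREQUISITES.items():
        mastered = pattern[attribute - 1] == 1
        if mastered and parent is not None and pattern[parent - 1] == 0:
            return False
    return True


# All 0/1 combinations of six attributes that respect the hierarchy,
# including the pattern in which no attribute is mastered.
consistent_patterns = [p for p in product((0, 1), repeat=6) if is_consistent(p)]

for p in consistent_patterns:
    print("".join(str(bit) for bit in p))
```

Under this encoding, a combination such as mastering attribute 3 without attribute 2 is rejected, which is the sense in which the hierarchy restricts the attribute combinations an examinee can plausibly hold.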
A formal representation is used where the adjacency, reachability, incidence, reduced incidence, and expected response matrices are specified (cf. Tatsuoka, 1983, 1990, 1991, 1995). A binary adjacency matrix ($A$) of order $k$ by $k$, where $k$ is the number of attributes, specifies the direct relationships among attributes. Then, a reachability matrix ($R$) of order $k$ by $k$ specifies the direct and indirect relationships among attributes. The $R$ matrix is calculated using

$$R = (A + I)^n,$$

where $n$ is the integer required for $R$ to reach invariance and can represent the numbers 1 through $k$, given $A$, the adjacency matrix, and $I$, an identity matrix. The incidence matrix ($Q$) of order $k$ by $p$, where $k$ is the number of attributes and $p$ is the number of potential items, is produced next. The set of potential items is considered a bank or pool of items that probes all combinations of attributes when the attributes are dependent and independent. The columns of the $Q$ matrix are created by converting the integers ranging from 1 to $2^k - 1$ to their binary form.
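As a minimal sketch of the matrices described above, the following Python code builds the adjacency matrix for the six-attribute sample hierarchy, raises (A + I) to successive powers until the result stops changing, and generates the incidence matrix from the binary forms of 1 through 2^k - 1. Several details are assumptions made for illustration: the function names are hypothetical, Boolean addition and multiplication are assumed for the matrix powers, and the first attribute is assumed to correspond to the most significant binary digit of each column.

```python
K = 6  # number of attributes in the sample hierarchy
# (prerequisite, dependent) pairs taken from the hierarchy described in the text
EDGES = [(1, 2), (2, 3), (1, 4), (4, 5), (4, 6)]


def adjacency(k, edges):
    """k-by-k binary matrix with a 1 where the row attribute directly precedes the column attribute."""
    a = [[0] * k for _ in range(k)]
    for pre, post in edges:
        a[pre - 1][post - 1] = 1
    return a


def bool_matmul(x, y):
    """Boolean matrix product (assumption: OR of ANDs rather than integer arithmetic)."""
    size = len(x)
    return [[int(any(x[i][m] and y[m][j] for m in range(size)))
             for j in range(size)] for i in range(size)]


def reachability(a):
    """Return (R, n): R = (A + I)^n, with n the first power at which R stops changing."""
    k = len(a)
    base = [[1 if i == j else a[i][j] for j in range(k)] for i in range(k)]  # A + I
    r = base
    for n in range(1, k + 1):
        nxt = bool_matmul(r, base)
        if nxt == r:  # invariance reached: a further power adds nothing
            return r, n
        r = nxt
    return r, k


def incidence(k):
    """k-by-(2^k - 1) matrix whose columns are the binary forms of 1 .. 2^k - 1.

    Assumption: row 1 (attribute 1) holds the most significant bit.
    """
    p = 2 ** k - 1
    return [[(j >> (k - 1 - i)) & 1 for j in range(1, p + 1)] for i in range(k)]


A = adjacency(K, EDGES)
R, n = reachability(A)
Q = incidence(K)

print("n =", n)
for row in R:
    print(row)
print("Q has", len(Q), "rows and", len(Q[0]), "columns")
```

In this sketch each row of R lists the attributes that the row attribute leads to, directly or indirectly, including itself; a different row and column convention would simply transpose the result.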